Alpha distribution (alpha)#
The Alpha distribution in SciPy (scipy.stats.alpha) is a continuous distribution on \((0,\infty)\) built from a simple transformation of a truncated standard normal.
Despite the generic name, this is not “the alpha parameter” from other contexts (e.g., Dirichlet/Beta). It’s its own distribution with a distinctive \(1/x^2\) tail, which implies that the mean and variance are infinite.
Learning goals#
Write down the PDF/CDF and understand where they come from.
Build intuition via the truncated-normal generative story.
Understand which moments exist (and which diverge).
Sample from the distribution with a NumPy-only algorithm.
Use scipy.stats.alpha for evaluation and fitting.
import numpy as np
import plotly
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
import scipy
from scipy import optimize
from scipy.stats import alpha as alpha_dist
from scipy.stats import norm
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=5, suppress=True)
rng = np.random.default_rng(42)
# Record versions for reproducibility (useful when numerical details matter).
VERSIONS = {"numpy": np.__version__, "scipy": scipy.__version__, "plotly": plotly.__version__}
1) Title & Classification#
Name: alpha (Alpha distribution; SciPy: scipy.stats.alpha)
Type: Continuous
Support (standard form): \(x \in (0,\infty)\)
Parameter space (standard form): shape \(a > 0\)
SciPy location/scale: \(\text{loc} \in \mathbb{R}\), \(\text{scale} > 0\), with
$$X = \text{loc} + \text{scale}\,Y, \qquad Y \sim \mathrm{Alpha}(a).$$
Unless stated otherwise, this notebook works with the standard form (loc=0, scale=1).
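As a quick check of this convention (a minimal sketch; the a, loc, and scale values below are arbitrary demo choices):
# Verify the loc/scale convention: pdf(x, a, loc, scale) == pdf((x - loc)/scale, a) / scale.
a_chk, loc_chk, scale_chk = 1.3, 2.0, 0.5
x_chk = np.array([2.1, 2.5, 4.0])
lhs = alpha_dist.pdf(x_chk, a_chk, loc=loc_chk, scale=scale_chk)
rhs = alpha_dist.pdf((x_chk - loc_chk) / scale_chk, a_chk) / scale_chk
assert np.allclose(lhs, rhs)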
2) Intuition & Motivation#
What it models#
A helpful way to think about the Alpha distribution is through its generative story:
Draw a standard normal random variable \(Z \sim \mathcal{N}(0,1)\).
Condition it to be below a threshold: \(Z \mid (Z \le a)\).
Convert “distance to the threshold” into a positive quantity and take a reciprocal:
$$X = \frac{1}{a - Z}.$$
If \(Z\) lands very close to \(a\) from below, then \((a-Z)\) is tiny and \(X\) becomes huge. That rare-but-possible event produces a heavy right tail.
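This story is easy to act out in a few lines of NumPy (a tiny illustration; a = 1.5 is an arbitrary demo value, a local generator leaves the notebook's shared rng untouched, and the full sampler appears in Section 7):
# Act out the generative story: truncate, transform, and count extreme draws.
rng_demo = np.random.default_rng(0)
a_demo = 1.5
z = rng_demo.normal(size=100_000)
z = z[z <= a_demo]            # condition on Z <= a
x_demo = 1.0 / (a_demo - z)   # reciprocal distance to the threshold
print("fraction of draws with x > 100:", float(np.mean(x_demo > 100)))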
Typical real-world use cases#
The Alpha distribution appears in the reliability literature as a model for positive quantities with occasional extreme values (e.g., lifetimes, repair times, stress/strength-like ratios after a transformation). It is most appropriate when:
values are strictly positive;
the right tail can be very long (outliers are not just noise);
you want a model with a clear “threshold proximity” interpretation via the truncated normal story.
Relations to other distributions#
Truncated normal: \(Z \mid (Z \le a)\) is the core latent variable.
Reciprocal transform: \(X\) is a reciprocal of \((a-Z)\).
Pareto-like tail: the Alpha PDF behaves like \(\text{const} \cdot x^{-2}\) for large \(x\), which implies \(\mathbb{P}(X>x) \approx \text{const} \cdot x^{-1}\). This is heavy enough that \(\mathbb{E}[X]\) diverges.
3) Formal Definition#
Let \(\phi\) and \(\Phi\) denote the standard normal PDF and CDF.
PDF#
For \(x>0\) and \(a>0\), the Alpha distribution has PDF
$$f(x; a) = \frac{\phi(a - 1/x)}{x^2\,\Phi(a)}.$$
CDF#
For \(x>0\),
$$F(x; a) = \frac{\Phi(a - 1/x)}{\Phi(a)},$$
and \(F(x;a)=0\) for \(x\le 0\).
Quantile function (PPF)#
Solving \(q = F(x;a)\) for \(x\) gives
$$F^{-1}(q; a) = \frac{1}{a - \Phi^{-1}\!\left(q\,\Phi(a)\right)}, \qquad 0 < q < 1.$$
This matches SciPy’s internal implementation of alpha.ppf.
def alpha_pdf(x: np.ndarray, a: float) -> np.ndarray:
"""PDF of the standard Alpha(a) distribution (loc=0, scale=1)."""
x = np.asarray(x, dtype=float)
out = np.zeros_like(x, dtype=float)
mask = x > 0
xa = x[mask]
out[mask] = norm.pdf(a - 1.0 / xa) / (xa**2 * norm.cdf(a))
return out
def alpha_logpdf(x: np.ndarray, a: float) -> np.ndarray:
"""Log-PDF of the standard Alpha(a) distribution (more stable in the tail)."""
x = np.asarray(x, dtype=float)
out = np.full_like(x, -np.inf, dtype=float)
mask = x > 0
xa = x[mask]
out[mask] = norm.logpdf(a - 1.0 / xa) - 2.0 * np.log(xa) - np.log(norm.cdf(a))
return out
def alpha_cdf(x: np.ndarray, a: float) -> np.ndarray:
"""CDF of the standard Alpha(a) distribution."""
x = np.asarray(x, dtype=float)
out = np.zeros_like(x, dtype=float)
mask = x > 0
xa = x[mask]
out[mask] = norm.cdf(a - 1.0 / xa) / norm.cdf(a)
return out
def alpha_ppf(q: np.ndarray, a: float) -> np.ndarray:
"""Quantile function (inverse CDF) of the standard Alpha(a) distribution."""
q = np.asarray(q, dtype=float)
return 1.0 / (a - norm.ppf(q * norm.cdf(a)))
# Sanity check: our formulas match SciPy.
a = 1.7
x = np.logspace(-3, 2, 25)
assert np.allclose(alpha_pdf(x, a), alpha_dist.pdf(x, a))
assert np.allclose(alpha_cdf(x, a), alpha_dist.cdf(x, a))
assert np.allclose(alpha_ppf(np.linspace(0.01, 0.99, 9), a), alpha_dist.ppf(np.linspace(0.01, 0.99, 9), a))
4) Moments & Properties#
Tail behavior#
As \(x\to\infty\), we have \(a - 1/x \to a\), so
$$f(x; a) \sim \frac{\phi(a)}{\Phi(a)} \cdot \frac{1}{x^2}.$$
Consequently,
$$\mathbb{P}(X > x) \sim \frac{\phi(a)}{\Phi(a)} \cdot \frac{1}{x}.$$
This is a power-law tail with exponent 1 for the survival function.
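A quick numerical check of the tail constant (a sketch; a = 1.5 is an arbitrary demo value):
# x^2 * f(x; a) should approach phi(a)/Phi(a) as x grows.
a_chk = 1.5
for x_big in [1e2, 1e4, 1e6]:
    print(f"x={x_big:.0e}: x^2 f(x) = {x_big**2 * alpha_dist.pdf(x_big, a_chk):.6f}")
print("phi(a)/Phi(a) =", norm.pdf(a_chk) / norm.cdf(a_chk))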
Mean / variance / skewness / kurtosis#
Because the tail is so heavy:
\(\mathbb{E}[X] = \infty\) (diverges logarithmically)
\(\mathrm{Var}(X) = \infty\) (since \(\mathbb{E}[X^2] = \infty\))
skewness and kurtosis are undefined (they require finite third/fourth central moments)
More generally, the power-law tail implies:
\(\mathbb{E}[X^p]\) is finite for \(p < 1\)
\(\mathbb{E}[X^p]\) diverges for \(p \ge 1\)
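These claims can be probed by integrating \(x^p f(x;a)\) up to growing cutoffs (a sketch; the shape value, the split point at 10, and the cutoffs are arbitrary demo choices):
# Partial integrals of x^p f(x; a): stabilize for p < 1, keep growing for p = 1.
from scipy.integrate import quad
a_chk = 1.5
for p in [0.5, 1.0]:
    vals = []
    for cutoff in [1e2, 1e4, 1e6]:
        head, _ = quad(lambda x: x**p * alpha_dist.pdf(x, a_chk), 0, 10)
        tail, _ = quad(lambda x: x**p * alpha_dist.pdf(x, a_chk), 10, cutoff)
        vals.append(head + tail)
    print(f"p={p}: partial integrals = {np.round(vals, 4)}")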
MGF / characteristic function#
The moment generating function \(M_X(t)=\mathbb{E}[e^{tX}]\) does not exist for any \(t>0\).
The Laplace transform \(\mathbb{E}[e^{tX}]\) does exist for \(t<0\) (and can be computed numerically).
The characteristic function \(\varphi_X(t)=\mathbb{E}[e^{itX}]\) exists for all real \(t\) because \(|e^{itX}|\le 1\).
A useful integral representation comes from the truncated-normal story. If \(Z\sim\mathcal{N}(0,1)\) is conditioned on \(Z\le a\) and \(X=1/(a-Z)\), then
$$\varphi_X(t) = \frac{1}{\Phi(a)} \int_{-\infty}^{a} e^{\,it/(a-z)}\,\phi(z)\,dz.$$
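Both claims are easy to verify numerically (a sketch; the values of a and t are arbitrary demo choices, and a local generator is used so the shared rng stream is untouched):
# Laplace transform at t < 0 via quadrature vs Monte Carlo; CF at a real t via Monte Carlo.
from scipy.integrate import quad
a_chk, t_neg, t_cf = 1.5, -1.0, 0.7
lap_quad, _ = quad(lambda x: np.exp(t_neg * x) * alpha_dist.pdf(x, a_chk), 0, np.inf)
s = alpha_dist.rvs(a_chk, size=200_000, random_state=np.random.default_rng(1))
print("Laplace quad:", lap_quad, " Laplace MC:", float(np.mean(np.exp(t_neg * s))))
print("CF MC       :", np.mean(np.exp(1j * t_cf * s)))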
Entropy#
The (differential) entropy is
$$h(X) = -\int_0^{\infty} f(x;a)\,\log f(x;a)\,dx,$$
which is finite and available via scipy.stats.alpha.entropy.
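As a cross-check, integrating \(-f \log f\) numerically reproduces SciPy's value (a sketch; the guard handles underflow of the PDF deep in the tails):
# Differential entropy by direct quadrature of -f log f.
from scipy.integrate import quad

def neg_f_logf(x: float, a: float) -> float:
    logf = alpha_dist.logpdf(x, a)
    return 0.0 if not np.isfinite(logf) else -np.exp(logf) * logf

a_chk = 1.0
h_quad, _ = quad(neg_f_logf, 0, np.inf, args=(a_chk,))
print("quad :", h_quad)
print("scipy:", alpha_dist.entropy(a_chk))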
a = 1.0
mean, var, skew, kurt = alpha_dist.stats(a, moments="mvsk")
entropy = alpha_dist.entropy(a)
print("SciPy stats (a=1.0):")
print(" mean =", mean)
print(" var =", var)
print(" skew =", skew)
print(" kurt =", kurt)
print(" entropy=", entropy)
qs = [0.5, 0.9, 0.99]
print("\nSelected quantiles:")
for q in qs:
print(f" q={q:>4}: {alpha_dist.ppf(q, a):.5f}")
# Empirical mean is unstable (finite for finite samples, but does not converge).
print("\nEmpirical summaries (same a, increasing n):")
for n in [200, 2_000, 20_000]:
x = alpha_dist.rvs(a, size=n, random_state=rng)
print(
f" n={n:>6}: mean={x.mean():.3f}, median={np.median(x):.3f}, 95%={np.quantile(x, 0.95):.3f}, max={x.max():.3f}"
)
SciPy stats (a=1.0):
mean = inf
var = inf
skew = nan
kurt = nan
entropy= 1.1728113403610725
Selected quantiles:
q= 0.5: 0.83321
q= 0.9: 3.30422
q=0.99: 29.25150
Empirical summaries (same a, increasing n):
n= 200: mean=1.584, median=0.803, 95%=4.951, max=38.214
n= 2000: mean=3.025, median=0.824, 95%=6.602, max=497.999
n= 20000: mean=5.053, median=0.841, 95%=6.204, max=16423.937
5) Parameter Interpretation#
The single shape parameter \(a>0\) plays two linked roles:
It sets the truncation point for the latent normal: \(Z\mid(Z\le a)\).
It appears in the reciprocal transform \(X = 1/(a-Z)\).
Intuitively:
Larger \(a\) increases the typical size of \((a-Z)\), so \(X\) tends to be smaller.
Smaller \(a\) makes it easier for \(Z\) to land near \(a\) (from below), producing more extreme \(X\) values.
The tail constant is proportional to \(\phi(a)/\Phi(a)\); this ratio decreases rapidly as \(a\) grows, so the far tail becomes less prominent for large \(a\) (even though the asymptotic power \(x^{-2}\) remains).
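The size of the tail constant is easy to tabulate (a quick sketch using the same a values as the plots below):
# Tail constant phi(a)/Phi(a): it shrinks rapidly as a grows.
for a_val in [0.5, 1.0, 2.0, 4.0]:
    print(f"a={a_val}: phi(a)/Phi(a) = {norm.pdf(a_val) / norm.cdf(a_val):.5f}")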
x = np.logspace(-3, 2, 600)
a_values = [0.5, 1.0, 2.0, 4.0]
fig = go.Figure()
for a in a_values:
fig.add_trace(go.Scatter(x=x, y=alpha_pdf(x, a), mode="lines", name=f"a={a}"))
fig.update_layout(
title="Alpha PDF for different a (log x-axis)",
xaxis_title="x",
yaxis_title="f(x; a)",
)
fig.update_xaxes(type="log")
fig.show()
# Tail view: survival function on log-log axes (highlights the ~1/x behavior).
x_tail = np.logspace(-1, 3, 600)
fig = go.Figure()
for a in a_values:
sf = 1.0 - alpha_cdf(x_tail, a)
fig.add_trace(go.Scatter(x=x_tail, y=sf, mode="lines", name=f"a={a}"))
fig.update_layout(
title="Alpha survival function on log-log axes",
xaxis_title="x",
yaxis_title="P(X > x)",
)
fig.update_xaxes(type="log")
fig.update_yaxes(type="log")
fig.show()
x = np.logspace(-3, 2, 600)
a_values = [0.5, 1.0, 2.0, 4.0]
fig = go.Figure()
for a in a_values:
fig.add_trace(go.Scatter(x=x, y=alpha_cdf(x, a), mode="lines", name=f"a={a}"))
fig.update_layout(
title="Alpha CDF for different a (log x-axis)",
xaxis_title="x",
yaxis_title="F(x; a)",
)
fig.update_xaxes(type="log")
fig.show()
6) Derivations#
6.1 Deriving the CDF and PDF#
Start with a truncated standard normal:
$$Z \mid (Z \le a), \qquad Z \sim \mathcal{N}(0,1).$$
Its density is
$$f_Z(z) = \frac{\phi(z)}{\Phi(a)}, \qquad z \le a.$$
Define \(X = 1/(a-Z)\). For \(x>0\),
$$F(x;a) = \mathbb{P}(X \le x) = \mathbb{P}\!\left(Z \le a - \frac{1}{x}\right) = \frac{\Phi(a - 1/x)}{\Phi(a)}.$$
Differentiating with respect to \(x\) gives
$$f(x;a) = \frac{\phi(a - 1/x)}{x^2\,\Phi(a)}.$$
6.2 Why the mean and variance diverge#
Use the tail asymptotic \(f(x;a) \sim C/x^2\) with \(C=\phi(a)/\Phi(a)\).
For the mean:
$$\mathbb{E}[X] = \int_0^{\infty} x\,f(x;a)\,dx \ \text{has integrand} \sim C/x,$$
and \(\int^{\infty} (1/x)\,dx\) diverges (logarithmically).
For the second moment:
$$\mathbb{E}[X^2] = \int_0^{\infty} x^2\,f(x;a)\,dx \ \text{has integrand} \sim C,$$
and \(\int^{\infty} 1\,dx\) diverges.
So \(\mathbb{E}[X]\) and \(\mathrm{Var}(X)\) are infinite.
6.3 Likelihood (i.i.d. sample)#
Given data \(x_1,\dots,x_n\) with \(x_i>0\), the likelihood for \(a\) (standard form) is
$$L(a) = \prod_{i=1}^{n} \frac{\phi(a - 1/x_i)}{x_i^2\,\Phi(a)}.$$
The log-likelihood is
$$\ell(a) = \sum_{i=1}^{n} \log \phi(a - 1/x_i) - 2\sum_{i=1}^{n} \log x_i - n \log \Phi(a).$$
Differentiating (using \(\tfrac{d}{da}\log\Phi(a)=\phi(a)/\Phi(a)\) and \(\tfrac{d}{da}\log\phi(u)= -u\)) gives the score:
$$\ell'(a) = -\sum_{i=1}^{n} \left(a - \frac{1}{x_i}\right) - n\,\frac{\phi(a)}{\Phi(a)}.$$
Setting \(\ell'(a)=0\) yields a nonlinear equation in \(a\) (because of \(\phi(a)/\Phi(a)\)), so maximum likelihood estimation typically uses numerical methods.
def alpha_loglik(a: float, x: np.ndarray) -> float:
x = np.asarray(x, dtype=float)
if a <= 0 or np.any(x <= 0):
return -np.inf
# Uses SciPy's stable norm.logpdf implementation.
return float(np.sum(norm.logpdf(a - 1.0 / x) - 2.0 * np.log(x)) - x.size * np.log(norm.cdf(a)))
def alpha_score(a: float, x: np.ndarray) -> float:
x = np.asarray(x, dtype=float)
if a <= 0 or np.any(x <= 0):
return np.nan
return float(-x.size * (norm.pdf(a) / norm.cdf(a)) - np.sum(a - 1.0 / x))
# MLE demo in the standard form (loc=0, scale=1) via a grid.
a_true = 1.8
x = alpha_dist.rvs(a_true, size=3_000, random_state=rng)
a_grid = np.linspace(0.05, 6.0, 500)
ll = np.array([alpha_loglik(a, x) for a in a_grid])
a_hat_grid = a_grid[np.argmax(ll)]
fig = go.Figure(go.Scatter(x=a_grid, y=ll - ll.max(), mode="lines"))
fig.add_vline(x=a_true, line_dash="dash", line_color="green", annotation_text="true a")
fig.add_vline(x=a_hat_grid, line_dash="dash", line_color="red", annotation_text="grid MLE")
fig.update_layout(
title="Log-likelihood (centered) for a (standard form)",
xaxis_title="a",
yaxis_title="log L(a) - max_a log L(a)",
)
fig.show()
# Compare to SciPy's fit when loc/scale are fixed.
a_hat_scipy, loc_hat, scale_hat = alpha_dist.fit(x, floc=0, fscale=1)
print("True a =", a_true)
print("Grid MLE =", a_hat_grid)
print("SciPy fit =", a_hat_scipy)
print("(loc,scale)=", (loc_hat, scale_hat))
True a = 1.8
Grid MLE = 1.8385771543086173
SciPy fit = 1.836523437500002
(loc,scale)= (0, 1)
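Because the score \(\ell'(a)\) derived above is available in closed form (alpha_score), the MLE can also be obtained by root-finding; a minimal sketch (the bracket [0.05, 6.0] matches the grid and is assumed to contain the root):
# Solve the score equation l'(a) = 0 with a bracketing root finder.
a_hat_root = optimize.brentq(lambda a_: alpha_score(a_, x), 0.05, 6.0)
print("Root-finding MLE =", a_hat_root)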
7) Sampling & Simulation#
NumPy-only algorithm#
Use the truncated-normal representation:
Sample \(Z \sim \mathcal{N}(0,1)\) until \(Z\le a\) (rejection sampling for a one-sided truncation).
Transform \(X = 1/(a-Z)\).
Because \(a>0\), the acceptance probability is \(\mathbb{P}(Z\le a)=\Phi(a)\ge 1/2\), so this rejection sampler is typically efficient.
We implement it below using only NumPy.
def alpha_rvs_numpy(a: float, size=1, *, rng: np.random.Generator | None = None) -> np.ndarray:
"""Draw samples from Alpha(a) using only NumPy.
Algorithm:
- sample Z ~ N(0,1) until Z <= a (one-sided truncation)
- return X = 1/(a - Z)
"""
if a <= 0:
raise ValueError("a must be > 0")
rng = np.random.default_rng() if rng is None else rng
size_tuple = (size,) if np.isscalar(size) else tuple(size)
n = int(np.prod(size_tuple))
out = np.empty(n, dtype=float)
filled = 0
while filled < n:
# Oversample to reduce loop overhead.
m = max(256, 2 * (n - filled))
z = rng.normal(size=m)
z = z[z <= a]
if z.size == 0:
continue
take = min(z.size, n - filled)
out[filled : filled + take] = 1.0 / (a - z[:take])
filled += take
return out.reshape(size_tuple)
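A quick sanity check of the sampler against the exact CDF (a sketch; a = 2.0 is an arbitrary demo value and a local generator leaves the shared rng stream untouched):
# One-sample KS test: NumPy-only samples vs the exact Alpha(a) CDF.
from scipy.stats import kstest
x_check = alpha_rvs_numpy(2.0, size=10_000, rng=np.random.default_rng(2))
print(kstest(x_check, alpha_dist(2.0).cdf))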
8) Visualization#
We’ll compare:
the theoretical PDF and CDF
Monte Carlo samples (NumPy-only sampler)
SciPy’s implementation
a = 1.0
n = 50_000
x_np = alpha_rvs_numpy(a, size=n, rng=rng)
x_sp = alpha_dist.rvs(a, size=n, random_state=rng)
# Histogram vs PDF
x_grid = np.logspace(-3, 2, 500)
pdf_grid = alpha_pdf(x_grid, a)
fig = px.histogram(
x=x_np,
nbins=120,
histnorm="probability density",
title="Monte Carlo histogram (NumPy-only) vs theoretical PDF",
labels={"x": "x"},
)
fig.add_trace(go.Scatter(x=x_grid, y=pdf_grid, mode="lines", name="theoretical PDF"))
fig.update_xaxes(type="log")
fig.show()
# Empirical CDF vs theoretical CDF
x_sorted = np.sort(x_np)
ecdf = np.arange(1, n + 1) / n
cdf_grid = alpha_cdf(x_grid, a)
fig = go.Figure()
fig.add_trace(go.Scatter(x=x_sorted, y=ecdf, mode="lines", name="empirical CDF (NumPy-only)"))
fig.add_trace(go.Scatter(x=x_grid, y=cdf_grid, mode="lines", name="theoretical CDF"))
fig.update_layout(title="CDF: empirical vs theoretical", xaxis_title="x", yaxis_title="F(x)")
fig.update_xaxes(type="log")
fig.show()
# Quick check that NumPy-only samples and SciPy samples look similar (KS statistic).
from scipy.stats import ks_2samp
ks = ks_2samp(x_np, x_sp)
print("KS two-sample test (NumPy vs SciPy samples):")
print(ks)
KS two-sample test (NumPy vs SciPy samples):
KstestResult(statistic=0.005519999999999969, pvalue=0.42984315222942004, statistic_location=1.0048665241757195, statistic_sign=1)
9) SciPy Integration#
scipy.stats.alpha provides the usual distribution API:
alpha.pdf(x, a, loc=0, scale=1)
alpha.cdf(x, a, loc=0, scale=1)
alpha.rvs(a, loc=0, scale=1, size=..., random_state=...)
alpha.fit(data, ...) (MLE)
A common pattern is to freeze the distribution: rv = alpha(a, loc=..., scale=...), then call rv.pdf, rv.cdf, rv.rvs, etc.
a = 2.0
rv = alpha_dist(a) # frozen, standard form
x = np.array([0.1, 0.5, 1.0, 5.0])
print("pdf:", rv.pdf(x))
print("cdf:", rv.cdf(x))
samples = rv.rvs(size=5, random_state=rng)
print("rvs:", samples)
# Fitting (standard form): fix loc=0, scale=1 and estimate only a.
a_true = 1.5
data = alpha_dist.rvs(a_true, size=5_000, random_state=rng)
a_hat, loc_hat, scale_hat = alpha_dist.fit(data, floc=0, fscale=1)
print("\nFit (fixed loc/scale):")
print(" true a:", a_true)
print(" est a:", a_hat)
print(" (loc, scale):", (loc_hat, scale_hat))
pdf: [0. 1.63292 0.2476 0.00323]
cdf: [0. 0.51164 0.86093 0.98651]
rvs: [0.60403 0.28935 0.38106 0.65239 2.43316]
Fit (fixed loc/scale):
true a: 1.5
est a: 1.5148437500000016
(loc, scale): (0, 1)
10) Statistical Use Cases#
Hypothesis testing (goodness-of-fit)#
If you have a specified parameter \(a\) (not estimated from the same sample), you can test whether data plausibly comes from an Alpha distribution using a goodness-of-fit test such as Kolmogorov–Smirnov (KS).
Caveat: if you estimate \(a\) from the data and then run KS on the same data, the usual p-values are no longer exact (you need a corrected procedure or a bootstrap; a bootstrap sketch follows the KS demo below).
Bayesian modeling#
Because the likelihood \(p(x\mid a)\) is available in closed form, you can put a prior on \(a>0\) (e.g., Gamma) and perform Bayesian inference with generic tools:
Below we show a simple grid-based posterior computation.
Generative modeling#
The Alpha distribution can be used as a heavy-tailed positive noise model or as a component in a mixture model when you want strictly-positive data with occasional extremes.
Because the mean is infinite, summary statistics and loss functions that rely on the mean (e.g., squared error to the mean) can behave poorly; robust summaries (median/quantiles) are often more appropriate.
# Hypothesis testing example: KS test when a is known.
from scipy.stats import kstest
a = 1.2
x = alpha_dist.rvs(a, size=2_000, random_state=rng)
D, p_value = kstest(x, alpha_dist(a).cdf)
print("KS test against Alpha(a=1.2):")
print(" D =", D)
print(" p-value=", p_value)
KS test against Alpha(a=1.2):
D = 0.018166844743858906
p-value= 0.5181148318909714
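Following the caveat above, here is a minimal parametric-bootstrap sketch for the estimated-parameter case (the helper name ks_pvalue_bootstrap is ours; assumptions: loc=0 and scale=1 are held fixed, and B is kept small for speed):
# Parametric bootstrap: refit a on each simulated sample to calibrate the KS statistic.
def ks_pvalue_bootstrap(data: np.ndarray, B: int = 200, seed: int = 0) -> float:
    rng_b = np.random.default_rng(seed)
    a_hat, _, _ = alpha_dist.fit(data, floc=0, fscale=1)
    d_obs = kstest(data, alpha_dist(a_hat).cdf).statistic
    d_boot = np.empty(B)
    for b in range(B):
        xb = alpha_dist.rvs(a_hat, size=len(data), random_state=rng_b)
        ab, _, _ = alpha_dist.fit(xb, floc=0, fscale=1)
        d_boot[b] = kstest(xb, alpha_dist(ab).cdf).statistic
    return float(np.mean(d_boot >= d_obs))

print("bootstrap p-value:", ks_pvalue_bootstrap(x))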
# Bayesian modeling example: grid posterior for a with a Gamma prior.
from scipy.stats import gamma as gamma_dist
rng_local = np.random.default_rng(123)
a_true = 1.8
x = alpha_dist.rvs(a_true, size=800, random_state=rng_local)
# Prior: a ~ Gamma(k, theta) with support (0, inf)
k, theta = 2.0, 1.0
a_grid = np.linspace(0.05, 6.0, 800)
log_prior = gamma_dist(a=k, scale=theta).logpdf(a_grid)
log_like = np.array([alpha_loglik(a, x) for a in a_grid])
log_post_unnorm = log_like + log_prior
log_post = log_post_unnorm - np.max(log_post_unnorm)
post_unnorm = np.exp(log_post)
from scipy.integrate import trapezoid  # np.trapz was removed in NumPy 2.0
post = post_unnorm / trapezoid(post_unnorm, a_grid)
a_map = a_grid[np.argmax(post)]
fig = go.Figure(go.Scatter(x=a_grid, y=post, mode="lines"))
fig.add_vline(x=a_true, line_dash="dash", line_color="green", annotation_text="true a")
fig.add_vline(x=a_map, line_dash="dash", line_color="red", annotation_text="MAP")
fig.update_layout(
title="Posterior over a (Gamma prior + Alpha likelihood)",
xaxis_title="a",
yaxis_title="posterior density",
)
fig.show()
print("True a =", a_true)
print("MAP =", a_map)
True a = 1.8
MAP = 1.7925531914893618
# Generative modeling example: heavy-tailed positive "durations".
# In practice, prefer robust summaries (quantiles) over the mean.
a = 1.0
durations = alpha_rvs_numpy(a, size=80_000, rng=rng).ravel()
print("Summaries (a=1.0):")
print(" median =", float(np.median(durations)))
print(" mean =", float(durations.mean()))
print(" 99% =", float(np.quantile(durations, 0.99)))
print(" max =", float(durations.max()))
# Empirical CCDF on log-log axes; the far tail is close to ~const/x.
x_sorted = np.sort(durations)
n = x_sorted.size
ccdf = 1.0 - np.arange(1, n + 1) / n
x0 = float(np.quantile(x_sorted, 0.9))
mask = x_sorted >= x0
x_tail = x_sorted[mask]
ccdf_tail = ccdf[mask]
# Reference ~c/x line anchored at x0 using the empirical CCDF.
c0 = float(ccdf_tail[0] * x_tail[0])
ref = c0 / x_tail
fig = go.Figure()
fig.add_trace(go.Scatter(x=x_tail, y=ccdf_tail, mode="lines", name="empirical CCDF (tail)"))
fig.add_trace(
go.Scatter(
x=x_tail,
y=ref,
mode="lines",
name="~c/x reference",
line=dict(dash="dash"),
)
)
fig.update_layout(
title="Generative example: empirical tail on log-log axes",
xaxis_title="x",
yaxis_title="P(X > x)",
)
fig.update_xaxes(type="log")
fig.update_yaxes(type="log")
fig.show()
Summaries (a=1.0):
median = 0.8381684906603133
mean = 4.219827793554541
99% = 30.191825495643975
max = 34695.03256416532
11) Pitfalls#
Invalid parameters: \(a\le 0\) is not allowed; the support is \(x>0\) in the standard form.
Infinite mean/variance: sample means can look “reasonable” for small \(n\) but are not stable estimators; prefer medians/quantiles.
Numerical issues in the tail:
use logpdf when multiplying many densities or when \(x\) is extreme;
ppf(q) for \(q\) extremely close to 1 can overflow because the true quantile is enormous (see the demo at the end of this section).
Fitting sensitivity: because extreme values occur, MLE can be sensitive to outliers and optimizer settings; consider robust diagnostics (QQ plots, tail checks) and bootstrap uncertainty.
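The ppf warning is easy to see directly (a quick demo; since the survival function behaves like \(C/x\), the true quantile grows like \(C/(1-q)\) as \(q \to 1\)):
# Quantiles explode as q -> 1; values of q that round to 1.0 give inf.
a_chk = 1.0
for q in [0.99, 0.999999, 1 - 1e-12]:
    print(f"q={q}: ppf = {alpha_dist.ppf(q, a_chk):.6g}")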
12) Summary#
The Alpha distribution (scipy.stats.alpha) is a continuous distribution on \((0,\infty)\) with shape parameter \(a>0\).
It can be generated by sampling a truncated normal \(Z\mid(Z\le a)\) and transforming via \(X=1/(a-Z)\).
Its right tail behaves like \(\text{const} \cdot x^{-2}\), implying \(\mathbb{E}[X]=\infty\) and \(\mathrm{Var}(X)=\infty\).
PDF/CDF/PPF have clean expressions in terms of the standard normal \(\phi\) and \(\Phi\).
Sampling is straightforward with a NumPy-only rejection sampler for the one-sided truncated normal.
References#
Johnson, Kotz, and Balakrishnan. Continuous Univariate Distributions, Volume 1 (2nd ed.), Wiley, 1994.
Salvia, A. A. “Reliability applications of the Alpha Distribution.” IEEE Transactions on Reliability, 1985.
SciPy documentation: scipy.stats.alpha.